Overview

Dataset statistics

Number of variables: 12
Number of observations: 25074
Missing cells: 153701
Missing cells (%): 51.1%
Duplicate rows: 38
Duplicate rows (%): 0.2%
Total size in memory: 2.3 MiB
Average record size in memory: 96.0 B
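The percentages in this table follow from the raw counts. The sketch below recomputes them as a sanity check; the relation between total size and average record size is an assumption made here for illustration, not something the report states.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 12
n_observations = 25074
missing_cells = 153701
duplicate_rows = 38
avg_record_bytes = 96.0

# Share of missing cells over all cells (variables x observations).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Share of duplicate rows over all observations.
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: total size is roughly observations x average record size.
total_mib = n_observations * avg_record_bytes / 2**20

print(f"missing cells: {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
print(f"total size: {total_mib:.1f} MiB")
```

Rounded to one decimal, the three values agree with the 51.1%, 0.2% and 2.3 MiB reported above.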

Variable types

Text: 11
Categorical: 1

Alerts

Dataset has 38 (0.2%) duplicate rows (Duplicates)
surname has 5915 (23.6%) missing values (Missing)
occupation has 8896 (35.5%) missing values (Missing)
age has 8639 (34.5%) missing values (Missing)
civil_status has 14370 (57.3%) missing values (Missing)
nationality has 11760 (46.9%) missing values (Missing)
surname_household has 19434 (77.5%) missing values (Missing)
link has 4339 (17.3%) missing values (Missing)
birth_date has 17730 (70.7%) missing values (Missing)
lob has 15839 (63.2%) missing values (Missing)
employer has 22163 (88.4%) missing values (Missing)
observation has 24472 (97.6%) missing values (Missing)
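The per-variable missing counts should add up to the "Missing cells" total in the dataset statistics. A minimal check follows, using the counts listed in the alerts plus the 144 missing values reported in the firstname variable section (firstname raises no alert).

```python
# Missing-value counts per variable, copied from the report.
n_observations = 25074
missing = {
    "surname": 5915,
    "firstname": 144,  # from the variable section; no alert is raised
    "occupation": 8896,
    "age": 8639,
    "civil_status": 14370,
    "nationality": 11760,
    "surname_household": 19434,
    "link": 4339,
    "birth_date": 17730,
    "lob": 15839,
    "employer": 22163,
    "observation": 24472,
}

# The sum should equal the "Missing cells" figure (153701).
total_missing = sum(missing.values())

# Per-variable share of missing values, as a percentage of all observations.
missing_pct = {name: 100 * count / n_observations for name, count in missing.items()}

print(total_missing)
for name, pct in missing_pct.items():
    print(f"{name}: {pct:.1f}%")
```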

Reproduction

Analysis started: 2024-04-12 20:29:44.698208
Analysis finished: 2024-04-12 20:29:46.714902
Duration: 2.02 seconds
Software version: ydata-profiling v4.7.0
Download configuration: config.json

Variables

surname
Text

MISSING 

Distinct: 8120
Distinct (%): 42.4%
Missing: 5915
Missing (%): 23.6%
Memory size: 196.0 KiB

Length

Max length: 46
Median length: 28
Mean length: 7.1669189
Min length: 2

Characters and Unicode

Total characters: 137311
Distinct characters: 73
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
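As an illustration of what such character properties look like, the sketch below counts Unicode general categories with Python's standard `unicodedata` module over the five sample surnames listed in this section. It is not the code the report itself runs; the helper name `category_counts` is made up for this example.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count the Unicode general category of every character in the values."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


# The five sample surnames shown in this section.
sample = ["Breton", "Vignat", "Houy", "Violet", "Apelmeau"]
counts = category_counts(sample)

# "Lu" is an uppercase letter, "Ll" a lowercase letter.
print(counts)

# Each code point also carries a formal name.
print(unicodedata.name("é"))
```

On these five names the counter holds 5 uppercase letters (`Lu`) and 25 lowercase letters (`Ll`).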

Unique

Unique: 5034
Unique (%): 26.3%

Sample

1st row: Breton
2nd row: Vignat
3rd row: Houy
4th row: Violet
5th row: Apelmeau

Value                  Count    Frequency (%)
idem                   685      3.3%
le                     225      1.1%
femme                  147      0.7%
fe                     146      0.7%
martin                 101      0.5%
de                     68       0.3%
roux                   57       0.3%
faure                  52       0.3%
née                    47       0.2%
fme                    46       0.2%
Other values (7847)    19124    92.4%

Most occurring characters

Value                  Count    Frequency (%)
e                      16174    11.8%
a                      12396    9.0%
r                      11917    8.7%
u                      9218     6.7%
i                      9124     6.6%
o                      8730     6.4%
n                      8426     6.1%
l                      6263     4.6%
t                      6222     4.5%
d                      4292     3.1%
Other values (63)      44549    32.4%

Most occurring categories, scripts and blocks

Value                  Count    Frequency (%)
(unknown)              137311   100.0%

All characters fall into a single (unknown) category, script and block, so the per-category, per-script and per-block character frequencies are identical to the "Most occurring characters" table.

firstname
Text

Distinct: 2456
Distinct (%): 9.9%
Missing: 144
Missing (%): 0.6%
Memory size: 196.0 KiB

Length

Max length: 30
Median length: 27
Mean length: 6.994986
Min length: 1

Characters and Unicode

Total characters: 174385
Distinct characters: 68
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1697
Unique (%): 6.8%

Sample

1st row: Cyrille
2nd row: Zélie
3rd row: Caroline
4th row: Esther
5th row: Thérèse

Value                  Count    Frequency (%)
marie                  3721     13.7%
jean                   1792     6.6%
pierre                 1046     3.8%
jeanne                 851      3.1%
louis                  803      2.9%
françois               636      2.3%
louise                 614      2.3%
anne                   583      2.1%
joseph                 466      1.7%
antoine                451      1.7%
Other values (1480)    16269    59.7%

Most occurring characters

Value                  Count    Frequency (%)
e                      28927    16.6%
i                      17828    10.2%
n                      15939    9.1%
a                      14436    8.3%
r                      13537    7.8%
o                      7232     4.1%
s                      7158     4.1%
t                      6600     3.8%
l                      6593     3.8%
u                      6079     3.5%
Other values (58)      50056    28.7%

Most occurring categories, scripts and blocks

Value                  Count    Frequency (%)
(unknown)              174385   100.0%

All characters fall into a single (unknown) category, script and block, so the per-category, per-script and per-block character frequencies are identical to the "Most occurring characters" table.

occupation
Text

MISSING 

Distinct: 2056
Distinct (%): 12.7%
Missing: 8896
Missing (%): 35.5%
Memory size: 196.0 KiB

Length

Max length: 48
Median length: 43
Mean length: 7.9103103
Min length: 1

Characters and Unicode

Total characters: 127973
Distinct characters: 73
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1386
Unique (%): 8.6%

Sample

1st row: menuisier
2nd row: prop re
3rd row: domestique
4th row: fe de chambre
5th row: domestique

Value                  Count    Frequency (%)
idem                   3702     18.2%
cultivateur            1202     5.9%
néant                  922      4.5%
s.p                    656      3.2%
sans                   615      3.0%
cult                   549      2.7%
domestique             479      2.3%
de                     470      2.3%
journalier             450      2.2%
sp                     447      2.2%
Other values (1341)    10894    53.4%

Most occurring characters

Value                  Count    Frequency (%)
e                      16422    12.8%
i                      12728    9.9%
r                      12157    9.5%
t                      8179     6.4%
u                      7687     6.0%
a                      7608     5.9%
n                      7592     5.9%
m                      6403     5.0%
s                      5860     4.6%
d                      5714     4.5%
Other values (63)      37623    29.4%

Most occurring categories, scripts and blocks

Value                  Count    Frequency (%)
(unknown)              127973   100.0%

All characters fall into a single (unknown) category, script and block, so the per-category, per-script and per-block character frequencies are identical to the "Most occurring characters" table.

age
Text

MISSING 

Distinct: 253
Distinct (%): 1.5%
Missing: 8639
Missing (%): 34.5%
Memory size: 196.0 KiB

Length

Max length: 14
Median length: 2
Mean length: 1.9643444
Min length: 1

Characters and Unicode

Total characters: 32284
Distinct characters: 31
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 90
Unique (%): 0.5%

Sample

1st row: 25
2nd row: 30
3rd row: 24
4th row: 24
5th row: 49

Value                  Count    Frequency (%)
2                      367      2.2%
6                      358      2.1%
8                      354      2.1%
18                     343      2.0%
4                      341      2.0%
5                      340      2.0%
7                      335      2.0%
3                      335      2.0%
9                      326      1.9%
30                     326      1.9%
Other values (128)     13513    79.8%

Most occurring characters

Value                  Count    Frequency (%)
1                      4496     13.9%
2                      4255     13.2%
3                      3923     12.2%
4                      3629     11.2%
5                      3319     10.3%
6                      2891     9.0%
7                      2209     6.8%
0                      1935     6.0%
8                      1817     5.6%
9                      1442     4.5%
Other values (21)      2368     7.3%

Most occurring categories, scripts and blocks

Value                  Count    Frequency (%)
(unknown)              32284    100.0%

All characters fall into a single (unknown) category, script and block, so the per-category, per-script and per-block character frequencies are identical to the "Most occurring characters" table.

civil_status
Categorical

MISSING 

Distinct: 6
Distinct (%): 0.1%
Missing: 14370
Missing (%): 57.3%
Memory size: 196.0 KiB

Garçon: 2824
Fille: 2823
Homme marié: 2140
Femme mariée: 2113
Veuve: 512

Length

Max length: 12
Median length: 11
Mean length: 7.8179185
Min length: 4

Characters and Unicode

Total characters: 83683
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Garçon
2nd row: Fille
3rd row: Fille
4th row: Femme mariée
5th row: Femme mariée

Common Values

Value                  Count    Frequency (%)
Garçon                 2824     11.3%
Fille                  2823     11.3%
Homme marié            2140     8.5%
Femme mariée           2113     8.4%
Veuve                  512      2.0%
Veuf                   292      1.2%
(Missing)              14370    57.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Value                  Count    Frequency (%)
garçon                 2824     18.9%
fille                  2823     18.9%
homme                  2140     14.3%
marié                  2140     14.3%
femme                  2113     14.1%
mariée                 2113     14.1%
veuve                  512      3.4%
veuf                   292      2.0%

Most occurring characters

Value                  Count    Frequency (%)
m                      12759    15.2%
e                      12618    15.1%
r                      7077     8.5%
a                      7077     8.5%
i                      7076     8.5%
l                      5646     6.7%
o                      4964     5.9%
F                      4936     5.9%
é                      4253     5.1%
(space)                4253     5.1%
Other values (8)       13024    15.6%
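The word and character tables for civil_status can be reproduced from the Common Values counts alone. The sketch below assumes that words are obtained by lowercasing each value and splitting on whitespace, and that characters are counted as they appear; this matches the figures shown here but is an inference from the numbers, not a statement about the profiler's internals.

```python
from collections import Counter

# Counts copied from the "Common Values" table (missing values excluded).
common_values = {
    "Garçon": 2824,
    "Fille": 2823,
    "Homme marié": 2140,
    "Femme mariée": 2113,
    "Veuve": 512,
    "Veuf": 292,
}

word_counts = Counter()
char_counts = Counter()
for value, count in common_values.items():
    # Assumption: words are lowercased and split on whitespace.
    for word in value.lower().split():
        word_counts[word] += count
    # Characters are counted with their original case.
    for ch in value:
        char_counts[ch] += count

total_words = sum(word_counts.values())
total_chars = sum(char_counts.values())

print(f"garçon: {100 * word_counts['garçon'] / total_words:.1f}% of words")
print(f"m: {char_counts['m']} of {total_chars} characters")
```

Under these assumptions the totals come to 14957 words and 83683 characters, and the space character appears 4253 times, once in each "Homme marié" and "Femme mariée" value.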

Most occurring categories, scripts and blocks

Value                  Count    Frequency (%)
(unknown)              83683    100.0%

All characters fall into a single (unknown) category, script and block, so the per-category, per-script and per-block character frequencies are identical to the "Most occurring characters" table.

nationality
Text

MISSING 

Distinct: 73
Distinct (%): 0.5%
Missing: 11760
Missing (%): 46.9%
Memory size: 196.0 KiB

Length

Max length: 31
Median length: 9
Mean length: 7.2859396
Min length: 1

Characters and Unicode

Total characters: 97005
Distinct characters: 48
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 40
Unique (%): 0.3%

Sample

1st row: française
2nd row: française
3rd row: française
4th row: française
5th row: française

Value                  Count    Frequency (%)
française              8017     60.1%
idem                   4454     33.4%
français               377      2.8%
francaise              277      2.1%
polonaise              53       0.4%
id                     23       0.2%
espagnole              15       0.1%
belge                  14       0.1%
polonais               9        0.1%
portugaise             8        0.1%
Other values (58)      90       0.7%

Most occurring characters

Value                  Count    Frequency (%)
a                      17495    18.0%
i                      13268    13.7%
e                      12946    13.3%
s                      8802     9.1%
n                      8797     9.1%
r                      8718     9.0%
f                      8576     8.8%
ç                      8395     8.7%
d                      4493     4.6%
m                      4474     4.6%
Other values (38)      1041     1.1%

Most occurring categories, scripts and blocks

Value                  Count    Frequency (%)
(unknown)              97005    100.0%

All characters fall into a single (unknown) category, script and block, so the per-category, per-script and per-block character frequencies are identical to the "Most occurring characters" table.

surname_household
Text

MISSING 

Distinct: 4126
Distinct (%): 73.2%
Missing: 19434
Missing (%): 77.5%
Memory size: 196.0 KiB

Length

Max length: 52
Median length: 32
Mean length: 7.3673759
Min length: 3

Characters and Unicode

Total characters: 41552
Distinct characters: 66
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3414
Unique (%): 60.5%

Sample

1st row: Ferazzi
2nd row: Machol
3rd row: Desbois
4th row: Desbroper
5th row: Allemant

Value                  Count    Frequency (%)
vve                    60       1.0%
ve                     55       0.9%
veuve                  55       0.9%
le                     43       0.7%
martin                 34       0.6%
de                     26       0.4%
née                    23       0.4%
thomas                 18       0.3%
faure                  16       0.3%
roux                   16       0.3%
Other values (4175)    5830     94.4%

Most occurring characters

Value                  Count    Frequency (%)
e                      4814     11.6%
a                      3864     9.3%
r                      3539     8.5%
u                      2812     6.8%
o                      2651     6.4%
i                      2640     6.4%
n                      2559     6.2%
l                      2081     5.0%
t                      1843     4.4%
s                      1369     3.3%
Other values (56)      13380    32.2%

Most occurring categories, scripts and blocks

Value                  Count    Frequency (%)
(unknown)              41552    100.0%

All characters fall into a single (unknown) category, script and block, so the per-category, per-script and per-block character frequencies are identical to the "Most occurring characters" table.

link
Text

MISSING 

Distinct: 937
Distinct (%): 4.5%
Missing: 4339
Missing (%): 17.3%
Memory size: 196.0 KiB

Length

Max length: 48
Median length: 42
Mean length: 7.2057391
Min length: 1

Characters and Unicode

Total characters: 149411
Distinct characters: 73
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 573
Unique (%): 2.8%

Sample

1st row: sa fe
2nd row: sa fe
3rd row: le fils
4th row: le fils
5th row: le fils

Value                  Count    Frequency (%)
chef                   4792     14.7%
fils                   3314     10.2%
femme                  3260     10.0%
fille                  3175     9.8%
sa                     2810     8.6%
leur                   2264     7.0%
idem                   2178     6.7%
de                     1718     5.3%
ménage                 1299     4.0%
épouse                 936      2.9%
Other values (502)     6770     20.8%

Most occurring characters

Value                  Count    Frequency (%)
e                      28574    19.1%
f                      15541    10.4%
l                      12872    8.6%
(whitespace)           11781    7.9%
m                      11660    7.8%
i                      10302    6.9%
s                      9058     6.1%
a                      5624     3.8%
d                      5588     3.7%
c                      5351     3.6%
Other values (63)      33060    22.1%

Most occurring categories, scripts and blocks

Value                  Count    Frequency (%)
(unknown)              149411   100.0%

All characters fall into a single (unknown) category, script and block, so the per-category, per-script and per-block character frequencies are identical to the "Most occurring characters" table.

birth_date
Text

MISSING 

Distinct: 158
Distinct (%): 2.2%
Missing: 17730
Missing (%): 70.7%
Memory size: 196.0 KiB

Length

Max length: 9
Median length: 4
Mean length: 3.9978214
Min length: 1

Characters and Unicode

Total characters: 29360
Distinct characters: 25
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 40
Unique (%): 0.5%

Sample

1st row: 1905
2nd row: 1908
3rd row: 1878
4th row: 1906
5th row: 1908

Value                  Count    Frequency (%)
1901                   138      1.9%
1905                   133      1.8%
1902                   126      1.7%
1903                   124      1.7%
1891                   121      1.6%
1904                   121      1.6%
1890                   119      1.6%
1897                   118      1.6%
1907                   118      1.6%
1896                   117      1.6%
Other values (150)     6112     83.2%

Most occurring characters

Value                  Count    Frequency (%)
1                      8674     29.5%
8                      6325     21.5%
9                      4476     15.2%
0                      2051     7.0%
7                      1605     5.5%
6                      1482     5.0%
2                      1455     5.0%
5                      1217     4.1%
4                      1043     3.6%
3                      1005     3.4%
Other values (15)      27       0.1%

Most occurring categories, scripts and blocks

Value                  Count    Frequency (%)
(unknown)              29360    100.0%

All characters fall into a single (unknown) category, script and block, so the per-category, per-script and per-block character frequencies are identical to the "Most occurring characters" table.

lob
Text

MISSING 

Distinct: 2923
Distinct (%): 31.7%
Missing: 15839
Missing (%): 63.2%
Memory size: 196.0 KiB

Length

Max length: 37
Median length: 33
Mean length: 7.7566865
Min length: 1

Characters and Unicode

Total characters: 71633
Distinct characters: 83
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2293
Unique (%): 24.8%

Sample

1st row: St Eloy de Gy - Cher
2nd row: idem
3rd row: Chateauroux
4th row: Orléans
5th row: idem

Value                  Count    Frequency (%)
idem                   3390     25.9%
st                     581      4.4%
orléans                295      2.3%
(blank)                289      2.2%
la                     254      1.9%
loiret                 170      1.3%
de                     146      1.1%
le                     139      1.1%
et                     123      0.9%
paris                  111      0.8%
Other values (2860)    7603     58.0%

Most occurring characters

Value                  Count    Frequency (%)
e                      9874     13.8%
i                      6390     8.9%
m                      4464     6.2%
d                      4408     6.2%
r                      4063     5.7%
a                      4038     5.6%
n                      3906     5.5%
(whitespace)           3866     5.4%
l                      3460     4.8%
o                      3285     4.6%
Other values (73)      23879    33.3%

Most occurring categories, scripts and blocks

Value                  Count    Frequency (%)
(unknown)              71633    100.0%

All characters fall into a single (unknown) category, script and block, so the per-category, per-script and per-block character frequencies are identical to the "Most occurring characters" table.

employer
Text

MISSING 

Distinct: 1087
Distinct (%): 37.3%
Missing: 22163
Missing (%): 88.4%
Memory size: 196.0 KiB

Length

Max length: 52
Median length: 49
Mean length: 7.2998969
Min length: 1

Characters and Unicode

Total characters: 21250
Distinct characters: 78
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 902
Unique (%): 31.0%

Sample

1st row: Dupuis
2nd row: patron
3rd row: Bourgeois
4th row: divén
5th row: Usine d'ambere

Value                  Count    Frequency (%)
patron                 659      17.2%
idem                   607      15.8%
divers                 106      2.8%
de                     95       2.5%
patronne               84       2.2%
p                      46       1.2%
m                      46       1.2%
cie                    31       0.8%
et                     29       0.8%
po                     27       0.7%
Other values (1200)    2105     54.9%

Most occurring characters

Value                  Count    Frequency (%)
e                      2364     11.1%
r                      1880     8.8%
a                      1770     8.3%
n                      1592     7.5%
i                      1476     6.9%
t                      1454     6.8%
o                      1426     6.7%
d                      1031     4.9%
(whitespace)           924      4.3%
p                      895      4.2%
Other values (68)      6438     30.3%

Most occurring categories, scripts and blocks

Value                  Count    Frequency (%)
(unknown)              21250    100.0%

All characters fall into a single (unknown) category, script and block, so the per-category, per-script and per-block character frequencies are identical to the "Most occurring characters" table.

observation
Text

MISSING 

Distinct: 310
Distinct (%): 51.5%
Missing: 24472
Missing (%): 97.6%
Memory size: 196.0 KiB

Length

Max length: 137
Median length: 44
Mean length: 9.8272425
Min length: 1

Characters and Unicode

Total characters: 5916
Distinct characters: 74
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 260
Unique (%): 43.2%

Sample

1st row: m
2nd row: m
3rd row: v
4th row: m
5th row: m

Value                  Count    Frequency (%)
veuve                  97       8.6%
idem                   90       8.0%
et                     24       2.1%
de                     24       2.1%
marié                  23       2.0%
femme                  19       1.7%
(blank)                19       1.7%
du                     19       1.7%
sait                   18       1.6%
lire                   18       1.6%
Other values (385)     779      68.9%

Most occurring characters

Value                  Count    Frequency (%)
e                      898      15.2%
(whitespace)           528      8.9%
i                      464      7.8%
a                      343      5.8%
r                      340      5.7%
u                      317      5.4%
n                      309      5.2%
v                      252      4.3%
m                      246      4.2%
d                      244      4.1%
Other values (64)      1975     33.4%

Most occurring categories, scripts and blocks

Value                  Count    Frequency (%)
(unknown)              5916     100.0%

All characters fall into a single (unknown) category, script and block, so the per-category, per-script and per-block character frequencies are identical to the "Most occurring characters" table.

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.]
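To make the idea concrete, the sketch below builds a small nullity matrix by hand from three rows of the sample table in this report (rows 0, 1 and 6), with `None` standing in for NaN. It only illustrates the concept; the plots in the report are drawn by the profiling tool, and the variable names here are chosen for this example.

```python
columns = [
    "surname", "firstname", "occupation", "age", "civil_status", "nationality",
    "surname_household", "link", "birth_date", "lob", "employer", "observation",
]

# Rows 0, 1 and 6 of the sample table; None marks a missing cell (NaN).
rows = [
    ["Breton", "Cyrille", "menuisier", "25", "Garçon", "française",
     None, None, None, None, None, None],
    ["Vignat", "Zélie", "prop re", "30", None, "française",
     None, "sa fe", None, None, None, None],
    ["de Chaumont", "Georges", None, "11", "Garçon", "française",
     None, "le fils", None, None, None, None],
]

# Nullity matrix: True where a value is present, False where it is missing.
nullity = [[value is not None for value in row] for row in rows]

# Per-column completeness: the fraction of rows with a value present.
completeness = {
    name: sum(flags) / len(rows)
    for name, flags in zip(columns, zip(*nullity))
}

for flags in nullity:
    print("".join("#" if present else "." for present in flags))
print(completeness)
```

Each printed line is one row of the matrix, with `#` for a present value and `.` for a missing one, which is the pattern the nullity matrix plot renders graphically.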

Sample

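index | surname | firstname | occupation | age | civil_status | nationality | surname_household | link | birth_date | lob | employer | observation
0 | Breton | Cyrille | menuisier | 25 | Garçon | française | NaN | NaN | NaN | NaN | NaN | NaN
1 | Vignat | Zélie | prop re | 30 | NaN | française | NaN | sa fe | NaN | NaN | NaN | NaN
2 | Houy | Caroline | domestique | 24 | Fille | française | NaN | NaN | NaN | NaN | NaN | NaN
3 | Violet | Esther | fe de chambre | 24 | Fille | française | NaN | NaN | NaN | NaN | NaN | NaN
4 | Apelmeau | Thérèse | domestique | 49 | Femme mariée | française | NaN | NaN | NaN | NaN | NaN | NaN
5 | de Chaumont | Mathilde | profess | 30 | Femme mariée | française | NaN | sa fe | NaN | NaN | NaN | NaN
6 | de Chaumont | Georges | NaN | 11 | Garçon | française | NaN | le fils | NaN | NaN | NaN | NaN
7 | de Chaumont | Henro | NaN | 8 | Garçon | française | NaN | le fils | NaN | NaN | NaN | NaN
8 | de Chaumont | Gaston | NaN | 5 | Garçon | française | NaN | le fils | NaN | NaN | NaN | NaN
9 | Voisin | Anne | domestique | 24 | Fille | française | NaN | NaN | NaN | NaN | NaN | NaN

index | surname | firstname | occupation | age | civil_status | nationality | surname_household | link | birth_date | lob | employer | observation
25064 | NaN | NaN | NaN | NaN | NaN | francaise | NaN | NaN | 1867 | NaN | NaN | NaN
25065 | NaN | NaN | NaN | NaN | NaN | francaise | NaN | NaN | 1873 | NaN | NaN | NaN
25066 | NaN | NaN | NaN | NaN | NaN | francaise | NaN | NaN | 1884 | NaN | NaN | NaN
25067 | NaN | NaN | NaN | NaN | NaN | idem | NaN | chef | 1897 | Ay | NaN | NaN
25068 | NaN | NaN | NaN | NaN | NaN | idem | NaN | chef | 1897 | NaN | patron | NaN
25069 | NaN | NaN | NaN | NaN | NaN | NaN | Thierif | NaN | NaN | NaN | NaN | NaN
25070 | NaN | NaN | NaN | NaN | NaN | NaN | Painchaud | NaN | NaN | NaN | NaN | NaN
25071 | NaN | NaN | NaN | NaN | NaN | NaN | Gaston ve née | NaN | NaN | NaN | NaN | NaN
25072 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | frère | 1897 | NaN | patron | NaN
25073 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1914 | francaise | NaN | NaN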

Duplicate rows

Most frequently occurring

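index | surname | firstname | occupation | age | civil_status | nationality | surname_household | link | birth_date | lob | employer | observation | # duplicates
1 | Chambonet | Marie | NaN | NaN | Fille | NaN | NaN | fille | NaN | NaN | NaN | NaN | 3
17 | Jaffeux fille | Jeanne | NaN | NaN | Fille | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3
21 | Mailhut | Marie | NaN | NaN | Fille | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 3
0 | Campot | Jeanne | idem | NaN | Fille | NaN | NaN | sa fille | NaN | NaN | NaN | NaN | 2
2 | Chazeaux | Jean | NaN | NaN | Garçon | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2
3 | Cherleuille | Marie | idem | NaN | Fille | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2
4 | Coquin | Pierre | NaN | NaN | Garçon | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 2
5 | Corre | Hélène | NaN | NaN | Fille | NaN | NaN | fille | NaN | NaN | NaN | NaN | 2
6 | Corre | Louise | NaN | NaN | Femme mariée | NaN | NaN | sa femme | NaN | NaN | NaN | femme Foucaud | 2
7 | Dixneuf | Marie | NaN | 12 | Fille | NaN | NaN | NaN | NaN | idem | NaN | NaN | 2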
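A "# duplicates" column of this kind can be derived by counting identical rows. The sketch below does so with the standard library on a handful of shortened, hypothetical records (three fields instead of twelve); it illustrates the counting idea only and makes no claim about how the report defines its headline duplicate total.

```python
from collections import Counter

# Hypothetical shortened records: (surname, firstname, civil_status).
rows = [
    ("Chambonet", "Marie", "Fille"),
    ("Chazeaux", "Jean", "Garçon"),
    ("Chambonet", "Marie", "Fille"),
    ("Voisin", "Anne", "Fille"),
    ("Chazeaux", "Jean", "Garçon"),
    ("Chambonet", "Marie", "Fille"),
]

# Count how often each full row occurs; tuples are hashable, so they can be keys.
row_counts = Counter(rows)

# Keep only rows that occur more than once, most frequent first.
duplicates = [(row, n) for row, n in row_counts.most_common() if n > 1]

for row, n in duplicates:
    print(row, n)
```

Rows must match on every field to count as duplicates, which is why sparsely filled records such as those in the table above are the ones most likely to collide.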